Goto

Collaborating Authors

 efficient estimate


Efficient estimates of optimal transport via low-dimensional embeddings

arXiv.org Artificial Intelligence

Optimal transport distances (OT) have been widely used in recent work in Machine Learning as ways to compare probability distributions. These are costly to compute when the data lives in high dimension. Recent work aims specifically at reducing this cost by computing OT using low-rank projections of the data (seen as discrete measures) [Paty and Cuturi, 2019]. We extend this approach and show that one can approximate OT distances by using more general families of maps provided they are 1-Lipschitz. The best estimate is obtained by maximising OT over the given family. As OT calculations are done after mapping data to a lower dimensional space, our method scales well with the original data dimension. We demonstrate the idea with neural networks. We use Sinkhorn Divergences (SD) to approximate OT distances as they are differentiable and allow for gradientbased optimisation. We illustrate on synthetic data how our technique preserves accuracy and displays a low sensitivity of computational costs to the data dimension.


Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules

arXiv.org Machine Learning

We thank the opportunity offered by editors for this discussion and the discussants for their insightful comments and thoughtful contributions. We also want to congratulate Kallus (2020) for his inspiring work in improving the efficiency of policy learning by retargeting. Motivated from the discussion in Dukes and Vansteelandt (2020), we first point out interesting connections and distinctions between our work and Kallus (2020) in Section 1. In particular, the assumptions and sources of variation for consideration in these two papers lead to different research problems with different scopes and focuses. In Section 2, following the discussions in Li et al. (2020); Liang and Zhao (2020), we also consider the efficient policy evaluation problem when we have some data from the testing distribution available at the training stage. We show that under the assumption that the sample sizes from training and testing are growing in the same order, efficient value function estimates can deliver competitive performance. We further show some connections of these estimates with existing literature. However, when the growth of testing sample size available for training is in a slower order, efficient value function estimates may not perform well anymore. In contrast, the requirement of the testing sample size for DRITR is not as strong as that of efficient policy evaluation using the combined data. Finally, we highlight the general applicability and usefulness of DRITR in Section 3.